COMPARISON OF TWO TREATMENTS WHEN THERE MAY BE AN INITIAL EFFECT
Abstract

Consider situations where the treatment may cause an initial effect
and may also cause a long-range effect. We want to evaluate the treatment,
or to compare two treatments, when the effect of treatment may result from
two distinct mechanisms, $M_1$ and $M_2$. We may wish to evaluate $M_1$
and $M_2$ separately, but we may also want to evaluate their combined effect
$M_3$. Examples are given and the general results are applied to the special
case arising in weather modification studies and elsewhere: the possible
effects are multiplicative and the distribution of nonzero variables is
Gamma with at most the scale parameter affected by treatment. An example
demonstrates that the two components may be too weak to be judged significant
while their sum is large and significant. The locally optimum $C(\alpha)$ test is used.

There is a brief discussion of the power function of the tests. The 
asymptotic power agrees well, in general, with the results of the Monte 
Carlo simulation for the test of the combined effect. If the zero 
values are discarded and then $Z_2$ is employed, there is a large bias in the
power. The bias is more pronounced if the Wilcoxon, Mann-Whitney test is 
employed. Notice that the two effects under study may be acting in the 
same direction or they may be in opposition. 

TREATMENTS WITH TWO MECHANISMS, NEYMAN $C(\alpha)$ TESTS, POWER FUNCTION,
GAMMA DISTRIBUTION, MULTIPLICATIVE EFFECT

1. Introduction

We consider situations where the treatment may cause an initial effect 
and may also cause a long-range effect. There are many examples. Mosteller 
(1977) described the Portacaval Shunt Operation "designed to reduce pressure 
from the blood stream in the esophagus and thus prevent or stop hemorrhaging 
in the patient. The operation has a substantial death rate". Other treatments 
and also nontreatment have a substantial death rate. The Portacaval Shunt 
Operation may (1) affect the probability of surviving the initial period of 
treatment and also may (2) affect the number of years of survival of those
patients who do live through the operation. The two effects may be in the 
same direction or in opposite directions. Although both effects are of 
concern to the patient and his physician, the combined effect is also important. 

We want to evaluate the treatment, or to compare two treatments, when
the effect of treatment may result from two distinct mechanisms, denoted
by $M_1$ and $M_2$, say. Mechanism $M_1$ consists in the possible modification
of the probability of an initial effect of treatment. The hypothesis
that no such effect occurs will be denoted by $H_1$. Then, mechanism $M_2$
consists in a change in the conditional distribution of the variable under
study, say $Y$, given that $Y > 0$. The hypothesis that $M_2$ is not operating
will be denoted by $H_2$. We may wish to evaluate $M_1$ and $M_2$ separately,
but we may also want to estimate their combined effect $M_3$, the total
change per experimental unit. The distinction between the mechanisms
$M_1$ and $M_2$ and their combined effect $M_3$ is often ignored. This may
be unfortunate since the separate effects of the initial mechanism and of
the long-range mechanism may be weak and therefore difficult to detect
while their combined effect may be important and capable of detection.

On the other hand, $M_1$ and $M_2$ may be in opposition so that they tend
to cancel each other. In this case, an analysis of either alone may be
misleading.

A second example where the treatment may act through two mechanisms
arises in the treatment of cancer (and other diseases). $M_1$ could alter
the probability of unpleasant side effects that could force the patient
to withdraw from treatment and/or be fatal. $M_2$ may affect the conditional
expected length of survival, given that the patient continues treatment.

As an illustration, for some diagnoses of cancer, the standard treatment
is a harsh chemical program which some patients cannot withstand. A new
treatment consists of the administration of a transfer factor designed
to increase the patient's immunity to his/her specific kind of cancer.
The statistician consulted on such an experiment may want to compare the
treatments by comparing the performance of all patients assigned to one
treatment with the performance of all patients assigned to the other.
However, the physician may feel that those patients who withdrew from
treatment early in the experiment, for whatever reason, have been
administered so little treatment that their inclusion would not be
meaningful and would tend to dilute the results. In effect, the statistician
wants to study mechanism $M_3$ and the physician wants to study $M_2$.

In many examples, the distribution of survival time is nonstandard.
A further complication arises when the experimental units are not homogeneous.
In the example above, the patients may differ with respect to age, sex,
level of diagnosis, and so forth. These characteristics of the unit
may serve as predictor variables for the experimental variable under
study.

Speaking generally, we consider a randomized experiment with independent
units. For the $k$-th unit, let

$X_k$ = the predictor variable with probability density $f(x; \lambda)$, where
$X$ and $\lambda$ may be vectors;

$T_k$ = 1 if the new treatment is applied, 0 if not, with $\Pr\{T = 1\} = \pi$;

$Y_k$ = the experimental variable with probability density $p[y \mid x, \theta(t, \xi)]$
of known form, with vector parameters.

We assume that the effect of treatment enters through $\xi$, as follows. If
$\theta_j(t, \xi) = \theta_j$ when $T = 0$, then when $T = 1$ with the same value $x$, we have

(1.1)  $\theta_j(t, \xi) = \theta_j + [\ldots] + o(\xi)$, for $j = 1, \ldots, s$.

We thus have a triplet $(X, T, Y)$ for each experimental unit, with
probability density, say,

(1.2)  $\psi(x, t, y) = \pi^{t} (1 - \pi)^{1 - t} f(x; \lambda)\, p[y \mid x, \theta(t, \xi)]$.

The hypothesis of no effect becomes the hypothesis $\xi = 0$. Neyman
and Scott (1967) have found the locally optimum test of class $C(\alpha)$ to
have as test criterion

(1.3)  $Z = [\ldots]$

where [...]

The criterion $Z$ is asymptotically Normal(0,1) when $\xi = 0$, and is
noncentral Normal when $\xi \ne 0$, with noncentrality parameter equal to
$\xi$ multiplied by the denominator of $Z$.
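
As a rough illustration of how this asymptotic result translates into power, the following sketch computes the two-tailed power of a criterion that is Normal with unit variance and mean equal to the noncentrality parameter. The function name `asymptotic_power` and the argument name `tau` (standing for the noncentrality parameter) are introduced here for illustration only and do not appear in the original.

```python
from statistics import NormalDist

def asymptotic_power(tau: float, alpha: float = 0.10) -> float:
    """Two-tailed power of a criterion Z ~ Normal(tau, 1) at level alpha."""
    std = NormalDist()
    crit = std.inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical value
    # P(|Z| >= crit) when Z has mean tau and unit variance
    return std.cdf(-crit + tau) + std.cdf(-crit - tau)

power_at_null = asymptotic_power(0.0)   # equals the level when tau = 0
power_at_two = asymptotic_power(2.0)    # power grows with |tau|
```

At `tau = 0` the function returns the level of significance, and it is symmetric in the sign of `tau`, as a two-tailed test should be.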

Neyman and Scott (1965, 1967) noted that the test criterion (1.3) 
does not depend on the distribution f of the predictor variable X 
except that the denominator must take into account the variability of 
the predictor X as well as that of the experimental variable Y. 

Moran (1973) extended this result. 

In the situation of this paper, we may wish to consider three problems 
separately: 

1) We may want to estimate $\xi_1$, the effect of $M_1$, or we may want to
test that $M_1$ has no effect, which would correspond to $\xi_1 = 0$.

2) We may want to estimate $\xi_2$, the effect of $M_2$, or we may want to
test that $M_2$ has no effect, which would correspond to $\xi_2 = 0$.

3) We may want to estimate the combined effect $\xi_3$, or we may want to
test that $M_3$ has no effect, which would correspond to $\xi_3 = 0$.

We thus employ (1.3) to develop three test criteria $Z_1$, $Z_2$, and $Z_3$.

A case of wide application is considered in the next section.


2. Case of multiplicative effect accompanied by a Gamma distribution

In many applications, we can assume that if the effect occurs at
all, it is multiplicative. Often, we can assume that the distribution
of the nonzero variable is Gamma with shape parameter unaffected by
the treatment. We then have for the initial mechanism $M_1$

$\theta(\xi_1) = \theta (1 + \xi_1)$

so that $\xi_1$ measures the proportional improvement,

(2.1)  $\xi_1 = [\theta(\xi_1) - \theta] / \theta$.

For mechanism $M_2$, on combining the assumption that the nonzero
effect is multiplicative with the assumption of a Gamma distribution with
at most the scale parameter affected, we have

(2.2)  $p_Y(y \mid \gamma, \delta) = \frac{\delta^{\gamma}}{\Gamma(\gamma)}\, y^{\gamma - 1} e^{-\delta y}$,

where $\gamma > 0$ is the shape parameter and $\delta > 0$ is the inverse of the
scale parameter. Under the assumption that treatment can affect only
the scale parameter,

(2.3)  $\delta_t = \delta(\xi_2) = \delta_u / (1 + \xi_2)$,

where $t$ = treated (new treatment) and $u$ = untreated (standard).
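
The model of this section can be sketched as a small simulation. This is a minimal sketch and not the authors' program: the function name `simulate_unit` and its arguments are hypothetical, and it assumes that treatment multiplies the probability of a nonzero value by $(1 + \xi_1)$ and divides the inverse scale parameter by $(1 + \xi_2)$, leaving the shape parameter unchanged.

```python
import random

def simulate_unit(rng, theta, gamma, delta, xi1=0.0, xi2=0.0, treated=False):
    """Draw one observation Y from the zero-or-Gamma model.

    theta    : untreated probability of a nonzero value
    gamma    : Gamma shape parameter (assumed unaffected by treatment)
    delta    : untreated inverse scale parameter
    xi1, xi2 : multiplicative effects of mechanisms M1 and M2
    """
    p_nonzero = theta * (1.0 + xi1) if treated else theta
    rate = delta / (1.0 + xi2) if treated else delta
    if rng.random() >= p_nonzero:
        return 0.0
    # random.gammavariate takes (shape, scale); scale = 1 / inverse scale
    return rng.gammavariate(gamma, 1.0 / rate)

rng = random.Random(1)
treated = [simulate_unit(rng, 0.8, 0.6, 1.0, 0.1, 0.2, True) for _ in range(20000)]
mean_treated = sum(treated) / len(treated)
# Expected value under these assumptions:
# theta * (1 + xi1) * gamma * (1 + xi2) / delta = 0.88 * 0.72 = 0.6336
```

The parameter values are the typical weather modification case quoted later in the paper ($\theta = 0.8$, $\gamma = 0.6$, $\delta = 1.0$) with illustrative effects of 0.1 and 0.2.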


The analysis of weather modification experiments is an example
where the assumptions of multiplicative effect and of a Gamma distribution
of the nonzero effects are well satisfied. In fact, this application led to
the development of the problem (Neyman and Scott, 1967). Earlier analyses
used designs with comparison areas under Normal theory (cf. Moran, 1955)
and then under locally optimal $C(\alpha)$ theory (Neyman, Scott, and Vasilevskis,
1960). Moran showed that the comparison area design with cross-over of
treatment is advantageous when applicable. However, the effects of weather
modification appear to be widespread, causing contamination of the
comparison area. Similar difficulties can arise in other types of application;
we will restrict attention to randomized trials on homogeneous units.

Cloud seeding, for example by the release of silver iodide into clouds 
in an effort to provide nuclei for the condensation of water vapor, could 
possibly cause precipitation to reach the ground which would not have 
fallen otherwise (or conversely). In addition, the cloud seeding may 
increase (or decrease) the amount of precipitation falling, given that there 
is some precipitation. Thus, we have a mechanism $M_1$ and a mechanism $M_2$
that may be acting in the same or in opposite directions. The total effect 
depends on the combination of the two mechanisms. 

Meteorologists predict that both of the postulated mechanisms will be 
multiplicative. The distribution of nonzero precipitation is typically a 
Gamma distribution, and as illustrated in Figure 1, this approximation is 
reasonable even when the same shape parameter is employed for both the seeded 
and the not-seeded experimental units, at least for similar types of storms.

Figure 1

Typical comparison of observed distribution of nonzero precipitation
with Gamma distribution fitted by maximum likelihood with same shape
parameter. These data correspond to the six stations with altitude
< 1000 m in zone 4 of the Swiss hail experiment Grossversuch III.

When the storm categories may differ, it is reasonable (Dawkins, Neyman, Scott,
and Wells, 1977) to assume that the experimenters can predict the category
before treatment starts, and before the randomized decision to treat or not
treat the storm is made. For example, the experimenters can predict the
duration $D = d_k$ for the $k$-th experimental unit, and can assume that its effect
enters the distribution of untreated precipitation only through the
shape parameter which can be approximated linearly,

$\gamma(d) = A_0 + A_1 (d - d_0)$.

Here, $d_0$ is the population mean of the variable predicted duration,
and $A_0$, $A_1$, and $\delta_u$ are unknown 'nuisance parameters'. Notice
that the differences in storm types, which may be complex, have a 
summary effect on the distribution of precipitation which may be 
quite large but can be summarized by changes in the shape of the 
distribution of nonzero precipitation expressed as a linear function 
of the predicted duration. Since the effect of seeding, if any, is 
assumed to alter only the scale parameter, we have that the conditional 
distribution of nonzero precipitation, given the predicted duration, 
is a Gamma distribution with constant shape parameter. 

Our experience indicates that similar assumptions can be made in 
other fields of application, for example in survival analysis for 
clinical trials. 

Under the assumptions and notation adopted, the expected value 
of the precipitation in a treated unit is 

(2.4)  $E(Y_t) = \theta(\xi_1)\, A_0 / \delta(\xi_2) = A_0\, \theta\, (1 + \xi_1)(1 + \xi_2) / \delta_u$.

For a fixed value of $D$, that is, for a fixed category of storm types,
the expected percent effect of seeding, due to both mechanisms, is

(2.5)  Percent effect $= 100\,[(1 + \xi_1)(1 + \xi_2) - 1] = 100\,(\xi_1 + \xi_2 + \xi_1 \xi_2)$.

The theory of $C(\alpha)$ tests refers to certain limiting situations
where the number of observations is "large" and the effects of
treatment, such as $\xi_1$ and $\xi_2$, are "small". In consequence, we
have tended to adjust the test of $H_3$ to be particularly sensitive
to

$\xi_1 + \xi_2 = \eta$, say,

neglecting the product term $\xi_1 \xi_2$. In typical weather modification
experimentation (also in clinical trials), $\xi_1$ might be 0.1 and $\xi_2$
might be 0.2 so that the neglected product is only 0.02.
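
The arithmetic of this example can be checked with a short sketch comparing the exact percent effect of (2.5) with the first-order approximation that drops the product term. The function names `percent_effect` and `percent_effect_linear` are introduced here for illustration.

```python
def percent_effect(xi1: float, xi2: float) -> float:
    """Exact percent effect of both mechanisms, as in (2.5)."""
    return 100.0 * ((1.0 + xi1) * (1.0 + xi2) - 1.0)

def percent_effect_linear(xi1: float, xi2: float) -> float:
    """First-order approximation that neglects the product xi1 * xi2."""
    return 100.0 * (xi1 + xi2)

exact = percent_effect(0.1, 0.2)          # about 32 percent
approx = percent_effect_linear(0.1, 0.2)  # about 30 percent
```

With effects of 0.1 and 0.2 the two values differ by about two percentage points, which is the neglected product 0.02 expressed in percent.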


3. Application 

The test criteria for the individual tests are found to be
(Neyman and Scott, 1967a), using $C(\alpha)$ tests, as follows.

For the hypothesis $H_1$ that $\xi_1 = 0$, which means that the
probability of initial effect (the probability of initiating
precipitation) is not altered by treatment, the criterion corresponds
to the familiar $\chi$ for this case:

Number of Initial Reactions

                 Treated     Untreated    Total
React            $n_{+t}$    $n_{+u}$     $n_{+\cdot}$
Do not react     $n_{0t}$    $n_{0u}$     $n_{0\cdot}$
Total                                     $n$

(3.1)  $Z_1 = [n_{+t}\, n_{0u} - n_{+u}\, n_{0t}] / [n \pi (1 - \pi)\, n_{+\cdot}\, n_{0\cdot}]^{1/2}$,

where the notation is set out in the usual table and $\pi$ is the
adopted probability of treatment. The significance probability of
$Z_1$ is the two-tailed Normal probability; reject $H_1$ if $|Z_1| \ge \nu(\alpha)$,
the critical value corresponding to level $\alpha$.
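
A minimal sketch of this computation follows, assuming the criterion is the cross-product difference of the 2 x 2 table divided by $[n \pi (1 - \pi)\, n_{+\cdot}\, n_{0\cdot}]^{1/2}$. The function names and the counts in the example are hypothetical and are not data from the experiment.

```python
from math import sqrt
from statistics import NormalDist

def z1(n_react_t, n_react_u, n_none_t, n_none_u, pi=0.5):
    """Criterion Z_1 from the 2 x 2 table of initial reactions."""
    n = n_react_t + n_react_u + n_none_t + n_none_u
    n_react = n_react_t + n_react_u   # row total, units that react
    n_none = n_none_t + n_none_u      # row total, units that do not react
    num = n_react_t * n_none_u - n_react_u * n_none_t
    return num / sqrt(n * pi * (1.0 - pi) * n_react * n_none)

def two_tailed_p(z):
    """Two-tailed Normal significance probability of a criterion value z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

z = z1(45, 35, 5, 15)   # hypothetical counts, 50 treated and 50 untreated
p = two_tailed_p(z)
```

For these hypothetical counts the numerator is 500 and the denominator is 200, so the criterion equals 2.5.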

For the hypothesis $H_2$ that $\xi_2 = 0$, which means that, given
that the initial effect is survival (given that there is nonzero
precipitation), there is no effect of treatment (the scale parameter
in the Gamma distribution of nonzero precipitation is not altered),
the test criterion turns out to be (Dawkins, Neyman, Scott, and Wells, 1977)

(3.2)  $Z_2 = \{ \sum_t [\delta y_k - \gamma(d_k)] - \sum_u [\delta y_k - \gamma(d_k)] \} / (n A_0)^{1/2}$,

where the sums are taken over the $t$ = treated and $u$ = untreated
units separately. The $A_0$, $A_1$, and $\delta$ are solutions of the
simultaneous maximum likelihood equations taken over all nonzero units,
treated and untreated,

$\sum \psi[\gamma(d_k)] - n \log_e A_0 = \sum \log_e y_k - n \log_e (\sum y_k / n)$,
$\sum (d_k - \bar{d})\, \psi[\gamma(d_k)] = \sum (d_k - \bar{d}) \log_e y_k$,
$\delta = n A_0 / \sum y_k$,

with $\bar{d} = \sum d_k / n$, the grand mean, and $\psi$ the derivative of the
logarithm of the Gamma function. Also,

$\gamma(d_k) = A_0 + A_1 (d_k - \bar{d})$.

The significance probability of $Z_2$ is two-tailed Normal asymptotically.
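
The computation of $Z_2$ can be sketched for the simplest case with no predictor variable, so that the shape parameter is the same constant $A_0$ for every unit and the likelihood equation involving the durations drops out. This is a sketch under that assumption, not the authors' program: the function names `digamma`, `fit_gamma`, and `z2` are introduced here, the derivative of the log Gamma function is approximated by a standard recurrence and asymptotic series, and the shape equation is solved by bisection.

```python
from math import log, sqrt

def digamma(x):
    """Derivative of log Gamma(x) for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:              # shift the argument upward: psi(x) = psi(x+1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + log(x) - 0.5 / x - series

def fit_gamma(ys):
    """Maximum likelihood shape and inverse scale for positive values ys."""
    n = len(ys)
    mean_y = sum(ys) / n
    s = log(mean_y) - sum(log(y) for y in ys) / n   # positive unless all ys are equal
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = sqrt(lo * hi)                 # bisect on the logarithmic scale
        if log(mid) - digamma(mid) > s:     # trial shape is too small
            lo = mid
        else:
            hi = mid
    shape = sqrt(lo * hi)
    return shape, shape / mean_y            # inverse scale = n * shape / sum(ys)

def z2(treated, untreated):
    """Criterion Z_2 for nonzero observations when there is no predictor."""
    shape, delta = fit_gamma(treated + untreated)
    n = len(treated) + len(untreated)
    sum_t = sum(delta * y - shape for y in treated)
    sum_u = sum(delta * y - shape for y in untreated)
    return (sum_t - sum_u) / sqrt(n * shape)
```

Only the nonzero observations of each group are passed to `z2`; the criterion is positive when the treated values tend to be larger than the untreated ones.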

For the hypothesis $H_3$ that the combined effect of treatment is
zero, we have as noted above been testing that $\xi_1 + \xi_2 = 0$. The
test criterion $Z_3$ is a weighted sum of $Z_1$ and $Z_2$,

(3.3)  $Z_3 = (A_1 Z_1 + A_2 Z_2) / (A_1^2 + A_2^2)^{1/2}$,

with

$A_1^2 = \theta / (1 - \theta) = n_{+\cdot} / n_{0\cdot}$,

$A_2^2 = A_0^*$,

where the weights $A_1$ and $A_2$ are distinct from the coefficients of
$\gamma(d)$, and $A_0^*$ is the solution of the system of maximum likelihood
equations when both $\delta_t$ and $\delta_u$ are entered as separate and possibly
different parameters. The significance probability of $Z_3$ is two-tailed
Normal asymptotically.
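
The combination in (3.3) can be sketched as follows. The function name `z3` and the argument names `w1_sq` and `w2_sq` (the squared weights attached to $Z_1$ and $Z_2$) are introduced here for illustration, and the numerical inputs are hypothetical rather than values from the experiment.

```python
from math import sqrt

def z3(z1, z2, w1_sq, w2_sq):
    """Weighted combination of Z_1 and Z_2 with squared weights w1_sq and w2_sq."""
    w1, w2 = sqrt(w1_sq), sqrt(w2_sq)
    return (w1 * z1 + w2 * z2) / sqrt(w1_sq + w2_sq)

# Hypothetical inputs: 80 reacting and 20 non-reacting units give a first
# squared weight of 80 / 20, and 0.6 stands in for the shape estimate.
combined = z3(1.5, 1.3, 80 / 20, 0.6)
```

In this hypothetical case the combined criterion is about 1.87, larger than either component, which illustrates how two effects acting in the same direction can reinforce each other in $Z_3$.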


The application of the three test criteria is illustrated in
Table 1, referring to the evaluation of hail reports from the
Grossversuch III hail suppression experiment in southern Switzerland
(Sanger et al., 1958-64). An earlier analysis (Neyman, Scott, and
Wells, 1966) of the effects of seeding on rainfall (which is easier
to observe than hail) suggested that the effect of seeding is positive,
with a significant increase in rainfall, when there are stability
layers in the atmosphere, as indicated by the nearest early morning
radiosonde, observed at Milan. It is of interest, then, to study the
effects on hail for this category of days. The results are shown in
Table 1. The first two rows of the table refer to the category of days
'without stability layers', first for seeded (S) and then for nonseeded
(NS) days. The next two rows refer to days 'with stability layers',
and the last two rows to all days combined. The first block of results
refers to mechanism $M_1$: is the frequency of days with hail altered
by seeding? There is an indication of an increase of +54% for the
category of days with stability layers, but the increase is not
significant by the usual standards; the two-tail significance
probability corresponding to the test criterion $Z_1$ is only 0.093.
There is no suggestion of change on days without stability layers.

When we examine the second mechanism $M_2$, we note that the amount
of hail per day with hail appears to be increased by +47% but the
effect is not significant, P now being 0.17 for the experimental days
with stability layers on which there was hail, as estimated by the
asymptotic criterion $Z_2$. When we continue to the totality of experimental
days with stability layers and use $Z_3$ to evaluate the possible change
in the number of hail reports per experimental day on which there are
stability layers, we find an estimated increase of +127% with significance
probability 0.024. Thus the combined effect of the two mechanisms is large
and significant; they appear to be acting in the same direction.

However, on experimental days without stability layers, the
estimated effect is a small decrease, but it is far from significant. On
all experimental days whatsoever, the estimated increase is +74%
with P = 0.049, significant at the standard level.

We thus have evidence that both mechanisms are playing a role: 
there is some evidence of an increase in the probability of hail and, 
given that there is hail reported, there is some evidence of an increase 
in the number of hail reports. As occurred when rainfall was the 
experimental variable, we find the positive effect is pronounced on the 
experimental days with stability layers. Since the purpose of the cloud
seeding was to reduce hail, it appears that seeding with silver iodide
is counter-indicated, at least as performed in this experiment, on days
with stability layers. If the experiment is analysed using only days
with positive hail reports (comparing the hail counts on hail days
that were seeded with those when there was no seeding, but discarding
the days with no hail reports, as is done by some operators), the
estimated effects (as shown in the middle part of Table 1) would be much
smaller, not significant, and possibly misleading.

4. Discussion of the power function

The probability of detecting an effect when it exists is obtained
asymptotically from the noncentral Normal distribution of each of the
three test criteria. The power function of $Z_3$, the criterion for
combined effect of the two mechanisms, initial and long-range, is of
particular interest. We examine briefly how this power surface depends
on the two individual effects, on $\xi_1$ and on $\xi_2$ considered
separately, and how it depends on their sum $\xi_1 + \xi_2$, which
is an approximation to the total effect $\xi_1 + \xi_2 + \xi_1 \xi_2$. When the two
mechanisms are acting in the same direction, what is the power surface
in a typical example, and how does this contrast with the power when the
two mechanisms are acting in opposite directions? Is the asymptotic
approximation for the power adequate with moderate sample sizes?

Neyman and Scott (1967c) investigated the power of the locally optimum
$C(\alpha)$ test criterion $Z_2$ for detecting a change in the effect $\xi_2$ due
to mechanism $M_2$ in a randomized experiment consisting of 100 independent
trials, under the assumptions that the nonzero values are Gamma distributed
with no predictor variables and that the treatment effect is multiplicative,
changing at most the scale parameter. The power functions of three
nonparametric tests, the Wilcoxon, Mann-Whitney rank test, the
Kolmogorov-Smirnov test and the median test, were studied at the same time
since these tests are sometimes employed. The studies were made by Monte
Carlo simulation for typical cases arising in weather modification
experimentation, such as $\theta = 0.8$ for the untreated probability that
precipitation will occur, and $\gamma = 0.6$, $\delta = 1.0$ as the untreated
parameters in the Gamma distribution. With $n = 100$ experimental units, the
power was discouragingly low for all four tests. Even with level of
significance 0.10, the probability of detecting a multiplicative effect of
1.5, corresponding to an increase of 50%, was slightly less than 0.6 for the
locally optimum $Z_2$ test, and was lower still, about 0.45, for the Wilcoxon,
Mann-Whitney test, and even smaller for the Kolmogorov-Smirnov and the median
tests, a little more than 0.3 and a little less than 0.3, respectively. In
these studies, the ordering of the power functions of these four tests was
retained.

In the Monte Carlo studies reported here, we have continued comparisons
with the Wilcoxon, Mann-Whitney test, labelled U and drawn with a dashed
line in the figures. Figure 2 gives a comparison of the Monte Carlo
power of the locally optimum test criterion $Z_3$ for testing the effect
per experimental unit (solid line) as a function of the sum $\xi_1 + \xi_2$ with the
asymptotic theoretical power (dotted line). In each panel the value
of $\xi_1$ is fixed so that across a panel the value of $\xi_2$ is increasing,
negative at the left of the panel and positive at the right, with the
point of changeover through zero shifting as $\xi_1$ is increased. The case
considered is similar to that in the earlier paper except that 200
experimental units are considered in the randomized trials since we now
know that at least 200 trials are needed to achieve a reasonable experiment.
The asymptotic power function provides a reasonable approximation for
practical purposes except in those categories where $\xi_1$ is quite negative,
when the asymptotic power is too high, especially when $\xi_2$ is large and positive.

Figure 2

Horizontal axis: effect per experimental unit. Case: $\gamma = 0.6$, $\delta = 1.0$,
$\theta = 0.8$; No. treated + No. not treated = 200; No. samples = 500; level 0.10.
Curves: $Z_3$ asymptotic; $Z_3$, $Z_2$, and $U$ Monte Carlo.

Figure 2 also shows the power function of the criterion $Z_2$ for
comparison since, as noted above, some evaluations of the experiment have
been made using only the positive observations. Unless $\xi_1$ is near
zero (the center panel), the disagreement with the power functions of $Z_3$
is pronounced. In particular, the power function of $Z_2$ has its
minimum when $\xi_2$ is zero, not when $\xi_1 + \xi_2$ is zero so that, unless
$\xi_1 = 0$, the use of $Z_2$ to test for combined effect produces a test bias
which can be very large. When $\xi_1$ is negative, $Z_2$ has little chance
of detecting that the combined effect is not zero when it is in fact
negative, but has probability of several times the level of significance
of finding that the effect is nonzero when it is actually zero. When the
combined effect is positive, the power continues high. When $\xi_1$ is
positive, the power function of the criterion $Z_2$ is a reflection of
that just described. We thus conclude that when $M_1$ and $M_2$ are acting
in the same direction, so that $\xi_1$ and $\xi_2$ have the same sign, the
$Z_2$ test criterion has very little chance of detecting that the combined
effect is not zero, even when the total of the two effects is quite large.
However, when the two mechanisms are acting in opposite directions, the
power of $Z_2$ is greater than that of $Z_3$. Unfortunately, this
phenomenon persists even when the combined effect is zero, making the
test invalid unless $\xi_1 = 0$ also.

The power function of the Wilcoxon, Mann-Whitney test is even more bizarre.
As indicated by the short-dashed lines, the test bias is large unless $\xi_1$
is near zero, in which case the power function is much lower than that of
competing tests. When $\xi_1$ and $\xi_2$ have opposite signs, the power of
the U test tends to be very low, approximately the level of significance.
When the mechanisms are in the same direction the power increases but
this is not helpful since in just these categories the U test is
invalid, with a large probability of finding a nonzero effect when none exists.


We would like a method for estimating the effectiveness of the
combined mechanism, or for testing that the effect is zero, that is
more powerful than the test criterion $Z_3$. Several former colleagues
in the Statistical Laboratory, including Barry and Kang Ling James, S.
Odoom, and Paul Wang, are investigating these problems. Their studies
are not yet completed and will be reported elsewhere.

LEGENDS



Figure 2 

Power function for several tests that the combined effect per experimental
unit is zero. Comparison of the asymptotic theoretical power for $Z_3$ with
Monte Carlo simulated power for $Z_3$, for $Z_2$, and for Wilcoxon, Mann-Whitney
for fixed values of the initial effect $\xi_1$ and increasing values of the sum
of the two effects (and thus increasing values of the long-range effect $\xi_2$).